MSE loss is one of the most common loss functions in all of machine learning. For targets $y_i$ and predictions $\hat{y}_i$, the MSE loss is given as

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

The constant $\frac{1}{n}$ makes it a proper mean, but it’s just a scaling constant: positive scaling does not change the minimizer. As such, it is sometimes altered to suit a given application, e.g., using $\frac{1}{2n}$ so that the factor of 2 cancels when differentiating to derive a closed-form solution for linear regression.
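
As a concrete illustration, here is a minimal NumPy sketch of the loss; the array values are made up for the example.

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean squared error: (1/n) * sum of squared residuals."""
    residuals = y_true - y_pred
    return float(np.mean(residuals ** 2))

y = np.array([3.0, -0.5, 2.0, 7.0])     # targets (illustrative values)
y_hat = np.array([2.5, 0.0, 2.0, 8.0])  # predictions
print(mse(y, y_hat))                    # 0.375
```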

MSE loss was used in Rumelhart, Hinton, and Williams (1986), the “original” backpropagation paper, and it remains a top choice for regression problems in deep learning. (Notice that minimizing the sum of squares over a linear predictor is exactly ordinary least-squares linear regression.)
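
To make that parenthetical concrete, here is a short NumPy sketch on synthetic data: the closed-form minimizer of the sum of squares (the normal equations) matches what a standard least-squares solver returns.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # synthetic design matrix
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)  # noisy linear targets

# Closed-form minimizer of the sum of squares (normal equations):
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# A standard least-squares solver minimizes the same objective:
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_normal, w_lstsq))        # True
```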

It is not as common for classification tasks, where we are estimating an unknown probability mass function based on a (potentially inadequate) sample. In these cases, it usually helps to incorporate an explicit measure of the information captured, so cross-entropy loss is generally preferred.
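
A small sketch of why, with invented probability vectors: for a confidently wrong prediction, MSE stays bounded while cross-entropy grows without bound as the probability assigned to the true class goes to zero.

```python
import numpy as np

def mse(p: np.ndarray, q: np.ndarray) -> float:
    return float(np.mean((p - q) ** 2))

def cross_entropy(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    # H(p, q) = -sum_i p_i * log(q_i); eps guards against log(0)
    return float(-np.sum(p * np.log(q + eps)))

target = np.array([1.0, 0.0, 0.0])              # one-hot true class
confident_wrong = np.array([0.01, 0.98, 0.01])  # confidently wrong prediction

print(mse(target, confident_wrong))             # ~0.647, bounded
print(cross_entropy(target, confident_wrong))   # ~4.61, diverges as q -> 0
```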